btrfs receive: add --fsync and --syncfs options #1110

Closed
mge-fbe-com wants to merge 1 commit into kdave:devel from mge-fbe-com:receive_sync

Summary

Encoded writes were silently ignoring write errors. This is fixed on the kernel side in https://lore.kernel.org/lkml/20260330160644.3678224-1-mge@meta.com/. This change augments that fix with optional fsync/syncfs support so users can enable strict checks on data write durability.

New flags:

  • --fsync — fsyncs each file before closing during receive, ensuring data is persisted and detecting async I/O errors early at the individual file level. Higher I/O overhead but pinpoints the exact file that failed.
  • --syncfs — calls syncfs() on the destination filesystem after receiving, catching any remaining unflushed writeback errors in a single operation. Lower overhead but only reports errors at the end.

Both flags require the kernel-side fix to be effective — on unfixed kernels, encoded write errors are silently dropped and never flagged, so neither fsync nor syncfs will detect them.
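
The difference between the two strategies can be illustrated outside btrfs receive with coreutils `sync`, which calls fsync() when given a file and syncfs() when given `-f`. This is only a sketch of the trade-off, not the code in this PR: the function names `flush_each` and `flush_fs` are made up for the example, and it assumes coreutils 8.24 or newer.

```shell
#!/bin/bash
# Illustration only: per-file fsync vs. a single syncfs at the end.
# `sync FILE` issues fsync() on FILE; `sync -f PATH` issues syncfs() on the
# filesystem that contains PATH.

# Per-file strategy: flush every file individually, so a failure can be
# attributed to the exact file. Costs one flush per file.
flush_each() {
    local f
    for f in "$@"; do
        if ! sync "$f"; then
            echo "fsync on $f failed" >&2
            return 1
        fi
    done
}

# Whole-filesystem strategy: one flush after everything is written. Cheaper,
# but a failure only says that something on the filesystem did not persist.
flush_fs() {
    local dir="$1"
    if ! sync -f "$dir"; then
        echo "syncfs on $dir failed" >&2
        return 1
    fi
}

# Small demo in a temporary directory (hypothetical file names).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'a\n' > "$demo_dir/one"
printf 'b\n' > "$demo_dir/two"
flush_each "$demo_dir/one" "$demo_dir/two"
flush_fs "$demo_dir"
```

On a healthy filesystem both calls succeed silently; the point is where an error would surface, not the output.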

Test plan

Tested on a kernel with the fix, using a 2GB dm-flakey device that returns I/O errors for all writes beyond the first 100MB.
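
For reference, this is how those sizes translate into 512-byte sectors, using the same arithmetic as the test script's GOOD_SECTORS and BAD_SECTORS. The helper name `mb_to_sectors` is introduced here for illustration only.

```shell
#!/bin/bash
# Convert a size in MB to 512-byte sectors (the unit device-mapper tables use).
mb_to_sectors() { echo $(( $1 * 1024 * 1024 / 512 )); }

total_sectors=$(mb_to_sectors 2048)              # 2G device
good_sectors=$(mb_to_sectors 100)                # first 100MB accept writes
bad_sectors=$(( total_sectors - good_sectors ))  # remainder fails writes

echo "total=$total_sectors good=$good_sectors bad=$bad_sectors"
# prints: total=4194304 good=204800 bad=3989504
```

These values match the "Total sectors", "Good sectors" and "Bad sectors" lines in the test output.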

Without flags — error is only caught late, at the final BTRFS_IOC_SET_RECEIVED_SUBVOL ioctl:
=== Test: no sync flags ===
At subvol volume
ERROR: ioctl BTRFS_IOC_SET_RECEIVED_SUBVOL failed: Input/output error

With --syncfs — same late error, plus syncfs also reports the failure:
=== Test: with --syncfs ===
At subvol volume
ERROR: ioctl BTRFS_IOC_SET_RECEIVED_SUBVOL failed: Input/output error
ERROR: syncfs on destination failed: Input/output error

With --fsync — error is caught early at the specific file that failed:
=== Test: with --fsync ===
At subvol volume
ERROR: fsync on /tmp/flakey-mnt//volume/.meta/private/opts/artifacts_may_require_repo failed: Input/output error
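
Because the --fsync error names the file, a wrapper script could pull the failing path out of the captured stderr. The sketch below assumes the message format shown in the test output above; that format comes from this PR only and is not guaranteed to exist in any released btrfs-progs. The function name `failed_path_from_log` is hypothetical.

```shell
#!/bin/bash
# Read receive's stderr on stdin and print the path from any
# "ERROR: fsync on <path> failed: <reason>" line (format assumed from this PR).
failed_path_from_log() {
    sed -n 's/^ERROR: fsync on \(.*\) failed: .*$/\1/p'
}

# Example input taken from the test output above.
printf '%s\n' \
    'At subvol volume' \
    'ERROR: fsync on /tmp/flakey-mnt//volume/.meta/private/opts/artifacts_may_require_repo failed: Input/output error' \
    | failed_path_from_log
# prints: /tmp/flakey-mnt//volume/.meta/private/opts/artifacts_may_require_repo
```

The --syncfs and no-flag failures carry no path, so the function prints nothing for them.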

Test script (dm-flakey setup)
#!/bin/bash
set -euo pipefail

# Creates a device that works normally for the first N MB
# and returns I/O errors for all writes beyond that.
# Uses dm-linear + dm-flakey to simulate partial disk failure.
#
# Usage: sudo ./flakey.sh <sendstream> [size] [good_size_mb]
# Cleanup: sudo ./flakey.sh --cleanup

SENDSTREAM="${1:-}"
if [ -z "$SENDSTREAM" ] && [ "${1:-}" != "--cleanup" ]; then
    echo "Usage: $0 <sendstream> [size] [good_size_mb]" >&2
    echo "       $0 --cleanup" >&2
    exit 1
fi

LOOP_FILE="${LOOP_FILE:-/tmp/flakey-disk.img}"
TOTAL_SIZE="${2:-1G}"
GOOD_MB="${3:-200}"
DM_NAME="flakey-combo"
MNT="${MNT:-/tmp/flakey-mnt}"

cleanup() {
    echo "Cleaning up..."
    umount "$MNT" 2>/dev/null || true
    dmsetup remove "$DM_NAME" 2>/dev/null || true
    if [ -f "$LOOP_FILE" ]; then
        local loop
        loop=$(losetup -j "$LOOP_FILE" 2>/dev/null | cut -d: -f1)
        [ -n "$loop" ] && losetup -d "$loop" 2>/dev/null || true
    fi
    rm -f "$LOOP_FILE"
    rmdir "$MNT" 2>/dev/null || true
    echo "Done."
}

if [ "${1:-}" = "--cleanup" ]; then
    cleanup
    exit 0
fi

[ "$(id -u)" -ne 0 ] && { echo "Must run as root" >&2; exit 1; }

trap cleanup EXIT

# Create backing file and loop device
echo "Creating ${TOTAL_SIZE} backing file..."
truncate -s "$TOTAL_SIZE" "$LOOP_FILE"
LOOP=$(losetup --find --show "$LOOP_FILE")
TOTAL_SECTORS=$(blockdev --getsz "$LOOP")
GOOD_SECTORS=$(( GOOD_MB * 1024 * 1024 / 512 ))
BAD_SECTORS=$(( TOTAL_SECTORS - GOOD_SECTORS ))

echo "Loop device: $LOOP"
echo "Total sectors: $TOTAL_SECTORS ($(( TOTAL_SECTORS * 512 / 1024 / 1024 ))MB)"
echo "Good sectors: $GOOD_SECTORS (${GOOD_MB}MB)"
echo "Bad sectors:  $BAD_SECTORS ($(( BAD_SECTORS * 512 / 1024 / 1024 ))MB)"

# Format btrfs on the reliable device
echo ""
echo "Formatting btrfs on reliable device..."
mkfs.btrfs -f "$LOOP"

# Create dm device: first N MB linear (ok), rest flakey (error_writes)
echo "Creating dm device (first ${GOOD_MB}MB ok, rest errors)..."
dmsetup create "$DM_NAME" <<EOF
0 $GOOD_SECTORS linear $LOOP 0
$GOOD_SECTORS $BAD_SECTORS flakey $LOOP $GOOD_SECTORS 0 1 1 error_writes
EOF

echo "Device: /dev/mapper/$DM_NAME"
mkdir -p "$MNT"
mount "/dev/mapper/$DM_NAME" "$MNT"
echo "Mounted at $MNT"
echo ""
echo "First ${GOOD_MB}MB: normal I/O"
echo "After ${GOOD_MB}MB: all writes return -EIO"

run_test() {
    local label="$1"
    shift
    echo ""
    echo "=== Test: $label ==="
    echo "Command: /root/btrfs receive $* -f $SENDSTREAM $MNT"
    # Clean mount between tests
    umount "$MNT" 2>/dev/null || true
    # Temporarily make device reliable for mkfs
    dmsetup suspend "$DM_NAME"
    dmsetup load "$DM_NAME" --table "0 $TOTAL_SECTORS linear $LOOP 0"
    dmsetup resume "$DM_NAME"
    mkfs.btrfs -f "/dev/mapper/$DM_NAME" >/dev/null 2>&1
    mount "/dev/mapper/$DM_NAME" "$MNT"
    # Switch back to flakey
    sync
    dmsetup suspend "$DM_NAME"
    dmsetup load "$DM_NAME" <<DMEOF
0 $GOOD_SECTORS linear $LOOP 0
$GOOD_SECTORS $BAD_SECTORS flakey $LOOP $GOOD_SECTORS 0 1 1 error_writes
DMEOF
    dmsetup resume "$DM_NAME"
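    # Capture the status explicitly: under `set -e` a failing receive would
    # otherwise abort the script before the exit code is printed.
    local rc=0
    /root/btrfs receive -v "$@" -f "$SENDSTREAM" "$MNT" || rc=$?
    echo "Exit code: $rc"
    return 0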
}

run_test "no sync flags"
run_test "with --syncfs" --syncfs
run_test "with --fsync" --fsync
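
To inspect the device-mapper table the script loads without creating a device (and without root), the two table lines can be printed by a small function. This is a sketch: `flakey_table` is a made-up name, and the argument order is chosen for this example. In the flakey line, `0 1` is the up/down interval in seconds (never up, always down) and `1 error_writes` is one feature argument that fails writes while still serving reads.

```shell
#!/bin/bash
# Print a dm table: the first good_sectors map straight through (linear),
# the remaining sectors go through dm-flakey with error_writes.
flakey_table() {
    local dev="$1" good_sectors="$2" total_sectors="$3"
    local bad_sectors=$(( total_sectors - good_sectors ))
    printf '0 %d linear %s 0\n' "$good_sectors" "$dev"
    printf '%d %d flakey %s %d 0 1 1 error_writes\n' \
        "$good_sectors" "$bad_sectors" "$dev" "$good_sectors"
}

# Values from the test run in this PR (2G device, first 100MB good).
flakey_table /dev/loop1 204800 4194304
# prints:
# 0 204800 linear /dev/loop1 0
# 204800 3989504 flakey /dev/loop1 204800 0 1 1 error_writes
```

The output could be piped to `dmsetup create <name>` in place of the here-document, which is the only step that needs root.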

./flakey.sh ./sendstream 2G 100
Creating 2G backing file...
Loop device: /dev/loop1
Total sectors: 4194304 (2048MB)
Good sectors: 204800 (100MB)
Bad sectors:  3989504 (1948MB)

Formatting btrfs on reliable device...
btrfs-progs v6.7
See https://btrfs.readthedocs.io for more information.

Performing full device TRIM /dev/loop1 (2.00GiB) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              (null)
UUID:               5844132e-feb1-452e-8942-ca8b92321b62
Node size:          16384
Sector size:        4096
Filesystem size:    2.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP             102.38MiB
  System:           DUP               8.00MiB
SSD detected:       yes
Zoned device:       no
Incompat features:  extref, skinny-metadata, no-holes, free-space-tree
Runtime features:   free-space-tree
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/loop1

Creating dm device (first 100MB ok, rest errors)...
Device: /dev/mapper/flakey-combo
Mounted at /tmp/flakey-mnt

First 100MB: normal I/O
After 100MB: all writes return -EIO

=== Test: no sync flags ===
Command: /root/btrfs receive  -f ./sendstream /tmp/flakey-mnt
At subvol volume
ERROR: ioctl BTRFS_IOC_SET_RECEIVED_SUBVOL failed: Input/output error
Exit code: 1

=== Test: with --syncfs ===
Command: /root/btrfs receive --syncfs -f ./sendstream /tmp/flakey-mnt
At subvol volume
ERROR: ioctl BTRFS_IOC_SET_RECEIVED_SUBVOL failed: Input/output error
ERROR: syncfs on destination failed: Input/output error
Exit code: 1

=== Test: with --fsync ===
Command: /root/btrfs receive --fsync -f ./sendstream /tmp/flakey-mnt
At subvol volume
ERROR: fsync on /tmp/flakey-mnt//volume/.meta/private/opts/artifacts_may_require_repo failed: Input/output error
Exit code: 1
Cleaning up...
Done.

Add --fsync flag that fsyncs each file before closing during receive,
ensuring data is persisted and detecting async IO errors early.

Add --syncfs flag that calls syncfs() on the destination filesystem
after receiving, syncing all data in a single operation at the end.

Signed-off-by: Michal Grzedzicki <mge@meta.com>
mge-fbe-com closed this Apr 8, 2026